In this paper, we propose a deep learning framework that provides a unified approach to leg contact detection in humanoid robot walking gaits. Our formulation accurately and robustly estimates the contact state probability for each leg (i.e., stable or slip/no contact). The proposed framework employs solely proprioceptive sensing and, although it relies on simulated ground-truth contact data for the classification process, we demonstrate that it generalizes across varying friction surfaces and different legged robotic platforms while also transferring readily from simulation to practice. The framework is quantitatively and qualitatively assessed in simulation using ground-truth contact data and is contrasted against state-of-the-art methods with the ATLAS, NAO, and TALOS humanoid robots. Furthermore, its efficacy is demonstrated with a real TALOS humanoid. To reinforce further research endeavors, our implementation is offered as an open-source ROS/Python package, coined Legged Contact Detection (LCD).
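As a rough illustration of the per-leg contact-probability output described above, a single logistic unit over proprioceptive features can be sketched as follows. The 6-D feature vector, weights, and bias here are hypothetical stand-ins, not the paper's trained network:

```python
import numpy as np

def contact_probability(features, W, b):
    """Map a proprioceptive feature vector (e.g. base IMU and joint-encoder
    signals) to a stable-contact probability with a logistic unit."""
    z = features @ W + b
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 6-D proprioceptive features and (already trained) parameters.
rng = np.random.default_rng(0)
W, b = rng.normal(size=6), 0.0
p = contact_probability(rng.normal(size=6), W, b)
```

The actual framework replaces this single unit with a deep network, but the interface — proprioceptive features in, per-leg contact probability out — is the same.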
A methodology is proposed to address a key caveat of line-of-sight emission spectroscopy: it cannot provide spatially resolved temperature measurements in nonhomogeneous temperature fields. The aim of this research is to explore the use of data-driven models for measuring temperature distributions in a spatially resolved manner using emission spectroscopy data. Two categories of data-driven methods are analyzed: (i) feature engineering combined with classical machine learning algorithms, and (ii) end-to-end convolutional neural networks (CNN). In total, combinations of fifteen feature groups and fifteen classical machine learning models, as well as eleven CNN models, are considered and their performances explored. The results indicate that the combination of feature engineering and machine learning provides better performance than the direct use of CNNs. Notably, feature engineering comprising physics-guided transformation, signal-representation-based feature extraction, and Principal Component Analysis is found to be the most effective. Moreover, it is shown that when using the extracted features, the ensemble-based light blender learning model offers the best performance, with RMSE, RE, RRMSE, and R values of 64.3, 0.017, 0.025, and 0.994, respectively. The proposed method, based on feature engineering and the light blender model, is capable of measuring nonuniform temperature distributions from low-resolution spectra, even when the species concentration distribution in the gas mixture is unknown.
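The Principal Component Analysis stage of the feature-engineering pipeline can be sketched as below, with synthetic spectra standing in for real measurements; the physics-guided transforms and signal-representation features that precede PCA in the paper's pipeline are omitted:

```python
import numpy as np

def pca_features(spectra, k):
    """Project spectra (n_samples, n_wavelengths) onto the top-k principal
    components; rows are mean-centered before the SVD."""
    X = spectra - spectra.mean(axis=0)
    # Vt rows are the principal axes; the scores are the reduced features.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T

# Synthetic stand-in: 20 spectra sampled at 50 wavelengths, reduced to 5 features.
spectra = np.random.default_rng(1).normal(size=(20, 50))
feats = pca_features(spectra, k=5)
```

The reduced features would then be fed to the downstream machine-learning regressors.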
The two popular datasets ScanRefer [16] and ReferIt3D [3] connect natural language to real-world 3D data. In this paper, we curate a large-scale and complementary dataset extending both the aforementioned ones by associating all objects mentioned in a referential sentence to their underlying instances inside a 3D scene. Specifically, our Scan Entities in 3D (ScanEnts3D) dataset provides explicit correspondences between 369k objects across 84k natural referential sentences, covering 705 real-world scenes. Crucially, we show that by incorporating intuitive losses that enable learning from this novel dataset, we can significantly improve the performance of several recently introduced neural listening architectures, including improving the SoTA in both the Nr3D and ScanRefer benchmarks by 4.3% and 5.0%, respectively. Moreover, we experiment with competitive baselines and recent methods for the task of language generation and show that, as with neural listeners, 3D neural speakers can also noticeably benefit by training with ScanEnts3D, including improving the SoTA by 13.2 CIDEr points on the Nr3D benchmark. Overall, our carefully conducted experimental studies strongly support the conclusion that, by learning on ScanEnts3D, commonly used visio-linguistic 3D architectures can become more efficient and interpretable in their generalization without needing to provide these newly collected annotations at test time. The project's webpage is https://scanents3d.github.io/ .
Natural language interaction is a promising direction for democratizing 3D shape design. However, existing methods for text-driven 3D shape editing face challenges in producing decoupled, local edits to 3D shapes. We address this problem by learning disentangled latent representations that ground language in 3D geometry. To this end, we propose a complementary tool set including a novel network architecture, a disentanglement loss, and a new editing procedure. Additionally, to measure edit locality, we define a new metric that we call part-wise edit precision. We show that our method outperforms existing SOTA methods by 20% in terms of edit locality, and up to 6.6% in terms of language reference resolution accuracy. Our work suggests that by solely disentangling language representations, downstream 3D shape editing can become more local to relevant parts, even if the model was never given explicit part-based supervision.
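The abstract names a part-wise edit precision metric without spelling out its formula. One plausible reading — measuring what fraction of the actually edited geometry falls inside the intended part — can be sketched as follows; both the function and the toy point cloud are illustrative assumptions, not the paper's definition:

```python
import numpy as np

def partwise_edit_precision(delta, part_mask, tol=1e-6):
    """Fraction of edited points (per-point displacement above `tol`) that
    fall inside the intended part. This is one plausible reading of
    'part-wise edit precision'; the abstract does not give the exact formula."""
    edited = np.linalg.norm(delta, axis=1) > tol
    if not edited.any():
        return 1.0  # no edit at all is trivially local
    return float((edited & part_mask).sum() / edited.sum())

delta = np.zeros((5, 3))
delta[[0, 1, 2]] = 1.0  # three points were actually displaced by the edit
part_mask = np.array([True, True, False, False, True])  # the intended part
prec = partwise_edit_precision(delta, part_mask)
```

Here two of the three displaced points lie inside the intended part, so the score is 2/3.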
The current success of machine learning in image-based combustion monitoring rests on massive data, which is costly or even impossible to obtain for industrial applications. To address this conflict, we introduce few-shot learning to achieve combustion monitoring and classification for the first time. Two algorithms, a Siamese Network coupled with k Nearest Neighbors (SN-kNN) and a Prototypical Network (PN), were tested. Rather than utilizing solely visible images as in previous studies, we also used Infrared (IR) images. We analyzed the training process, test performance, and inference speed of both algorithms on both image formats, and used t-SNE to visualize the learned features. The results demonstrate that both SN-kNN and PN were capable of distinguishing flame states from merely 20 images per flame state. The worst performance, realized by PN on IR images, still achieved precision, accuracy, recall, and F1-score above 0.95. Visible images exhibited more substantial differences between classes and more consistent patterns within each class, which improved training speed and model performance compared to IR images. In contrast, the relatively low quality of IR images made it difficult for PN to extract distinguishable prototypes, which caused relatively weak performance. With the entire training set supporting classification, SN-kNN performed well even with IR images. On the other hand, benefiting from its architecture, PN is much faster than SN-kNN in both training and inference. The presented work analyzes the characteristics of both algorithms and image formats for the first time, thus providing guidance for their future utilization in combustion monitoring tasks.
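The prototype-based classification at the heart of PN — class prototypes as mean support embeddings, queries assigned to the nearest prototype — can be sketched in a few lines. Synthetic 2-D embeddings stand in for the learned flame-image features:

```python
import numpy as np

def prototypes(embeddings, labels):
    """Class prototype = mean embedding over that class's few support images."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(queries, classes, protos):
    """Assign each query embedding to the class of the nearest prototype."""
    d = np.linalg.norm(queries[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

# Two well-separated flame-state clusters, two support images each.
support = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 10.1]])
labels = np.array([0, 0, 1, 1])
classes, protos = prototypes(support, labels)
preds = classify(np.array([[0.05, 0.05], [9.9, 10.0]]), classes, protos)
```

SN-kNN differs in that it keeps every support embedding and votes over the k nearest neighbors, which is why its inference cost grows with the support set while PN's stays fixed.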
Earth observation is a growing research area that can capitalize on the power of AI for short-time-span forecasting, i.e., nowcasting. In this work, we tackle the challenge of weather forecasting using a video transformer network. Vision transformer architectures have been explored in various applications, with the major constraints being the computational complexity of attention and data-hungry training. To address these issues, we propose the use of a Video Swin-Transformer coupled with a dedicated augmentation scheme. In addition, we employ gradual spatial reduction on the encoder side and cross-attention on the decoder. The proposed approach is tested on the Weather4cast2021 weather forecasting challenge data, which requires predicting 8 hours of future frames (4 frames per hour) from an hourly sequence of weather products. The dataset was normalized to 0-1 to facilitate the use of the evaluation metrics across different datasets. The model achieves an MSE score of 0.4750 when provided with training data, and 0.4420 in transfer learning without the use of training data.
Digital twins have recently gained significant interest for the simulation, optimization, and predictive maintenance of Industrial Control Systems (ICS). Recent studies discuss the possibility of using digital twins for intrusion detection in industrial systems. Accordingly, this study contributes a digital twin-based security framework for industrial control systems, extending its capabilities for the simulation of attacks and defense mechanisms. Four types of process-aware attack scenarios are implemented on a standalone open-source digital twin: command injection, network Denial of Service (DoS), calculated measurement modification, and naive measurement modification. Based on an offline evaluation of eight supervised machine learning algorithms, a stacked ensemble classifier is proposed for real-time intrusion detection. By combining the predictions of the various algorithms, the designed stacking model outperforms previous methods in terms of F1-score and accuracy, while detecting and classifying intrusions in near real-time (0.1 seconds). This study also discusses the practicality and benefits of the proposed digital twin-based security framework.
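The stacking step — a meta-learner over the concatenated outputs of the base classifiers — can be sketched as follows. The base probabilities and meta weights are illustrative stand-ins, not the study's eight trained models:

```python
import numpy as np

def stack_predict(base_probs, meta_w, meta_b):
    """Meta-learner: a logistic unit over the concatenated attack
    probabilities produced by the base classifiers (the stacking step)."""
    z = base_probs @ meta_w + meta_b
    return (1.0 / (1.0 + np.exp(-z)) > 0.5).astype(int)

# Illustrative: two base classifiers' attack probabilities for two samples.
base_probs = np.array([[0.9, 0.8],
                       [0.1, 0.2]])
meta_w = np.array([1.0, 1.0])
preds = stack_predict(base_probs, meta_w, meta_b=-1.0)
```

In practice the meta-learner is itself fitted on held-out base-model predictions so that it learns how much to trust each base classifier.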
A deep learning approach is presented for the approximation of the Hamilton-Jacobi-Bellman partial differential equation (HJB PDE) associated with the Nonlinear Quadratic Regulator (NLQR) problem. A state-dependent Riccati equation control law is first used to generate a gradient-augmented synthetic dataset for supervised learning. The resulting model serves as a warm start for the minimization of a loss function based on the residual of the HJB PDE. The combination of supervised learning and residual minimization avoids spurious solutions and mitigates the data inefficiency of a supervised-learning-only approach. Numerical tests validate the different advantages of the proposed methodology.
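The two-stage objective can be summarized as a single combined loss: a supervised term fitting the gradient-augmented synthetic data, plus a penalty on the HJB residual at collocation points. The weighting and the toy inputs below are illustrative, not the paper's settings:

```python
import numpy as np

def combined_loss(V_pred, gradV_pred, V_data, gradV_data, hjb_residual, lam=1.0):
    """Supervised term fits values AND gradients of the SDRE-generated,
    gradient-augmented dataset; the second term penalizes the HJB PDE
    residual at collocation points. The weight `lam` is illustrative."""
    supervised = (np.mean((V_pred - V_data) ** 2)
                  + np.mean((gradV_pred - gradV_data) ** 2))
    residual = np.mean(hjb_residual ** 2)
    return supervised + lam * residual

# A perfect fit with zero PDE residual gives zero loss.
V = np.array([1.0, 2.0])
G = np.array([[0.5], [1.0]])
loss = combined_loss(V, G, V, G, hjb_residual=np.zeros(2))
```

During training, minimizing the supervised term first (the warm start) and then the full objective is what steers the residual minimization away from spurious HJB solutions.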
Incremental Task Learning (ITL) is a category of continual learning that seeks to train a single network for multiple tasks (one after another), where the training data for each task is available only while training that task. Neural networks tend to forget older tasks when trained on newer ones, a property often referred to as catastrophic forgetting. To address this issue, ITL methods use episodic memory, parameter regularization, masking and pruning, or extensible network structures. In this paper, we propose a new incremental task learning framework based on low-rank factorization. In particular, we represent the network weights of each layer as a linear combination of several rank-1 matrices. To update the network for a new task, we learn a rank-1 (or low-rank) matrix and add it to the weights of every layer. We also introduce an additional selector vector that assigns different weights to the low-rank matrices learned for the previous tasks. We show that our approach performs better than the current state-of-the-art methods in terms of both accuracy and forgetting. Our method also offers better memory efficiency compared to episodic-memory-based and mask-based approaches. Our code will be available at https://github.com/csiplab/task-increment-rank-update.git.
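The low-rank weight parameterization with a per-task selector vector can be sketched as follows; the dimensions, factor vectors, and selector values are illustrative, not a learned model:

```python
import numpy as np

def layer_weight(factors, selector):
    """One layer's weight as a selector-weighted sum of rank-1 matrices:
    W = sum_i s_i * u_i v_i^T, with one (u_i, v_i) pair per learned factor."""
    return sum(s * np.outer(u, v) for s, (u, v) in zip(selector, factors))

def add_task(factors, selector, u_new, v_new):
    """For a new task: learn one extra rank-1 factor and extend the selector
    (old factors stay frozen; the selector reweights them)."""
    return factors + [(u_new, v_new)], np.append(selector, 1.0)

# Two rank-1 factors for a hypothetical 3x4 layer after two tasks.
factors = [(np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0, 0.0])),
           (np.array([0.0, 1.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0]))]
selector = np.array([1.0, 1.0])
W = layer_weight(factors, selector)

# A third task adds one new rank-1 factor and one selector entry.
factors2, selector2 = add_task(factors, selector,
                               np.array([0.0, 0.0, 1.0]),
                               np.array([0.0, 0.0, 1.0, 0.0]))
```

Storing only the (u, v) pairs and the selector per task is what yields the memory advantage over episodic-memory and mask-based methods.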
Existing singing voice synthesis (SVS) models are usually trained on singing data and depend on either error-prone time alignment and duration features or explicit music score information. In this paper, we propose Karaoker, a multispeaker Tacotron-based model conditioned on voice characteristic features that is trained exclusively on spoken data without requiring time alignments. Karaoker synthesizes singing voice and transfers style following a multi-dimensional template extracted from a source waveform of an unseen singer/speaker. The model is jointly conditioned, with a single deep convolutional encoder, on continuous data including pitch, intensity, harmonicity, formants, cepstral peak prominence, and octaves. We extend the text-to-speech training objective with feature reconstruction, classification, and speaker identification tasks that guide the model toward an accurate result. In addition to multi-tasking, we also employ a Wasserstein GAN training scheme as well as new losses on the acoustic model's output to further refine the model's quality.